1. Abstract

We consider a sequence of repeated interactions between an agent and an environment. Uncertainty about the environment is captured by a probability distribution over a space of hypotheses, which includes all computable functions. Given a utility function, we can evaluate the expected utility of any computational policy for interaction with the environment. After making some plausible assumptions (and maybe one not-so-plausible assumption), we show that if the utility function is unbounded, then the expected utility of any policy is undefined.

2. AI Formalism 

We will assume that the interaction between the agent and the environment takes place in discrete time-steps, or cycles. In cycle $n$, the agent outputs an action $y_n \in Y$, and the environment inputs to the agent a perception $x_n \in X$. $Y$ and $X$ are the sets of possible actions and perceptions, respectively, and are considered as subsets of $\mathbb{N}$. Thus the story of all interaction between agent and environment is captured by the two sequences $x = (x_1, x_2, \ldots)$ and $y = (y_1, y_2, \ldots)$.

Let us introduce a notation for substrings. If $s$ is a sequence or string, and $\{a, b\} \subset \mathbb{N}$, $a \le b$, then define $s_a^b = (s_a, s_{a+1}, \ldots, s_b)$.

We will denote the function instantiated by the environment as $Q : Y^* \to X$, so that $\forall n \in \mathbb{N}$, $x_n = Q(y_1^n)$. This means that the perception generated by the environment at any given cycle is determined by the agent's actions on that and all previous cycles.

A policy for the agent is a function $p : (Y^* \times X^*) \to Y$, so that an agent implementing $p$ at time $n$ will choose an action $y_n = p(y_1^{n-1}, x_1^{n-1})$.

If, at any time, an agent adopts some policy $p$, and continues to follow that policy forever, then $p$ and $Q$ taken together completely determine the future of the sequences $(x_n)$ and $(y_n)$. We are particularly interested in the future sequence of perceptions, so we will define a future function $\Psi(Q, p, y_1^n, x_1^n) = x_{n+1}^\infty$.
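The cycle structure described above can be sketched in a few lines of Python. This is only an illustration under assumptions: the names `rollout`, `echo_env`, and `alternate` are hypothetical, and the rollout is truncated to finitely many cycles, whereas the future function is defined on infinite sequences. The environment sees the actions up to and including the current cycle, while the policy sees only strictly earlier actions and perceptions.

```python
from typing import Callable, List, Tuple

Env = Callable[[Tuple[int, ...]], int]                      # q : Y* -> X
Policy = Callable[[Tuple[int, ...], Tuple[int, ...]], int]  # p : Y* x X* -> Y


def rollout(q: Env, p: Policy, cycles: int) -> Tuple[List[int], List[int]]:
    """Return (y_1..y_n, x_1..x_n) for n = cycles."""
    ys: List[int] = []
    xs: List[int] = []
    for _ in range(cycles):
        y = p(tuple(ys), tuple(xs))   # y_n = p(y_1^{n-1}, x_1^{n-1})
        ys.append(y)
        xs.append(q(tuple(ys)))       # x_n = q(y_1^n)
    return ys, xs


def echo_env(actions: Tuple[int, ...]) -> int:
    # Hypothetical environment: the perception is the sum of all actions so far.
    return sum(actions)


def alternate(past_actions: Tuple[int, ...], past_perceptions: Tuple[int, ...]) -> int:
    # Hypothetical policy: alternate between actions 0 and 1.
    return len(past_actions) % 2


ys, xs = rollout(echo_env, alternate, 5)
```

Because the environment is a function of the action string alone, fixing the policy fixes the whole perception sequence, which is the determinism the future function relies on.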

Because the precise nature of the environment $Q$ is unknown to the agent, we will let $\Omega$ be the set of possible environments. Let $F$ be a $\sigma$-algebra on $\Omega$, and $P : F \to [0, 1]$ be a probability measure on $F$ which represents the agent's prior information about the environment.

We will also define a function $\Gamma_q : Y^* \to X^*$ which represents the perception string output by environment $q$ given some action string. Let
$\Gamma_q(s) = (q(s_1^1), q(s_1^2), \ldots, q(s))$.

The agent will compare the quality of different outcomes using a utility function $U : X^{\mathbb{N}} \to \mathbb{R}$. We can then judge a policy by calculating the expected utility of the outcome given that policy, which can be written as

(1) $E(U(x_1^n \Psi(Q, p, y_1^n, x_1^n)) \mid \Gamma_Q(y_1^n) = x_1^n)$

...where $Q$ is being treated as a random variable. When we write a string next to a sequence, as in $x_1^n \Psi(Q, p, y_1^n, x_1^n)$, we mean to concatenate them. Here, $x_1^n$ represents what the agent has seen in the past, and $\Psi(Q, p, y_1^n, x_1^n)$ represents something the agent may see in the future. By concatenating them, we get a complete sequence of perceptions, which is the input required by the utility function $U$.

Notice that the expected utility above is a conditional expectation. Except on the very first time-step, the agent will already have some knowledge about the environment. After $n$ cycles, the agent has output the string $y_1^n$, and the environment has output the string $x_1^n$. Thus the agent's knowledge is given by the equation $\Gamma_Q(y_1^n) = x_1^n$.
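As a rough illustration of this conditioning step, the sketch below computes the perception string an environment produces for a given action string, then conditions a prior on the observed history. The three-element hypothesis space and the names `gamma`, `posterior`, and `prior` are assumptions made for the example; the paper's hypothesis space is infinite.

```python
from fractions import Fraction


def gamma(q, actions):
    """Perception string (q(s_1^1), q(s_1^2), ..., q(s)) for action string s."""
    return tuple(q(tuple(actions[:k])) for k in range(1, len(actions) + 1))


def posterior(prior, actions, perceptions):
    """Condition a finite prior {name: (q, weight)} on the observed history."""
    consistent = {
        name: weight
        for name, (q, weight) in prior.items()
        if gamma(q, actions) == tuple(perceptions)
    }
    total = sum(consistent.values())
    if total == 0:
        raise ValueError("observed history has prior probability zero")
    return {name: weight / total for name, weight in consistent.items()}


# Hypothetical hypothesis space with three environments.
prior = {
    "count": (lambda s: len(s), Fraction(1, 2)),   # perceive the cycle number
    "last": (lambda s: s[-1], Fraction(1, 4)),     # perceive the latest action
    "zero": (lambda s: 0, Fraction(1, 4)),         # always perceive 0
}
post = posterior(prior, actions=(1, 2), perceptions=(1, 2))
```

In this toy case every environment that survives the conditioning ends up with a posterior weight at least as large as its prior weight, since the surviving weights are divided by a total that is at most 1.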

Agents such as AIXI (Hutter, 2007) choose actions by comparing the expected 
utility of different policies. Thus we will focus, in this paper, on calculating the 
expected utility of a policy. 

3. Assumptions about the Hypothesis Space 

Here we'll make further assumptions about the hypothesis space $(\Omega, F, P)$. While we could succinctly make strong assumptions that would justify our central claim, we will instead try to give somewhat weaker assumptions, even at the loss of some brevity.

Let $\Omega_C$ be the set of computable total functions mapping $Y^*$ to $X$. We will assume that $\Omega \supset \Omega_C$ and that $(\forall q \in \Omega_C) : \{q\} \in F$ and $P(\{q\}) > 0$. Thus we assume that the agent assigns a nonzero probability to any computable environment function.

Let $\Omega_P$ be the set of computable partial functions from $Y^*$ to $X$. Then $\Omega_C \subset \Omega_P$. The computable partial functions $\Omega_P$ can be indexed by natural numbers, using a surjective computable index function $\phi : \mathbb{N} \to \Omega_P$. Since the codomain of $\phi$ is a set of partial functions, it may be unclear what we mean when we say that $\phi$ is computable. We mean that $(i, s) \mapsto (\phi(i))(s)$, whose codomain is $X$, is a computable partial function. We will also use the notation $\phi_i = \phi(i)$.

We'll now assume that there exists a computable total function $\rho : \mathbb{N} \to \mathbb{Q}$ such that if $\phi_i \in \Omega_C$, then $0 < \rho(i) \le P(\{\phi(i)\})$. Intuitively, we are saying that $\phi$ is a way of describing computable functions using some sort of language, and that $\rho$ is a way of specifying lower bounds on probabilities based on these descriptions. Note that we make no assumption about $\rho(i)$ when $\phi_i \notin \Omega_C$.

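To see an example of a hypothesis space satisfying all of our assumptions, let $\Omega = \Omega_C$, let $F = 2^{\Omega}$, let $\phi$ be any programming language, and let $\rho(i) = 2^{-i}$. Let

(2) $\sigma = \sum_{i : \phi_i \in \Omega_C} \rho(i)$

and for any $\omega \in \Omega$, let

(3) $P(\{\omega\}) = \frac{1}{\sigma} \sum_{i : \phi_i = \omega} \rho(i)$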
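One way to read this example is as a normalize-and-pool construction: give index $i$ the raw weight $2^{-i}$, discard indices that do not denote total functions, pool the weights of indices that denote the same function, and divide by the total. The sketch below follows that reading on a finite table. The table `denotes` and the function name `toy_prior` are hypothetical, and deciding which indices denote total functions is not computable in general, so this is purely illustrative.

```python
from fractions import Fraction


def toy_prior(denotes):
    """denotes[i] names the total function computed by index i, or is None
    when index i does not compute a total function (hypothetical table)."""
    rho = {i: Fraction(1, 2 ** i) for i in denotes}
    # Normalizing constant: total raw weight on indices of total functions.
    sigma = sum(rho[i] for i, name in denotes.items() if name is not None)
    prior = {}
    for i, name in denotes.items():
        if name is not None:
            # Pool the weight of every index that denotes the same function.
            prior[name] = prior.get(name, Fraction(0)) + rho[i] / sigma
    return rho, prior


denotes = {1: "f", 2: None, 3: "g", 4: "f"}
rho, prior = toy_prior(denotes)
```

Here the normalizing constant is at most 1, so each function's probability is at least the raw weight of any single index that denotes it, which is the lower-bound property required of the weights.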

4. Assumptions about the Utility Function 

Perhaps the most philosophically questionable assumption in this paper has already been made in defining the domain of the utility function $U$ as $X^{\mathbb{N}}$, the set of perception-sequences. This is like assuming that a person cares not about his or her family and friends, but about his or her perception of his or her family and friends.

Since the utility function $U : X^{\mathbb{N}} \to \mathbb{R}$ takes as its argument an infinite sequence, we must discuss what it means for such a function to be computable. Obviously any computation which terminates can only look at a finite number of terms. Therefore we will try to approximate $U(x)$ using prefixes of $x$. We say that $U$ is computable if there exist computable functions $U_L, U_U : X^* \to \mathbb{Q} \cup \{-\infty, +\infty\}$ such that, if $x \in X^{\mathbb{N}}$ and $\tilde{x} \in X^*$ and $\tilde{x} \sqsubset x$, then:

• $U_L(\tilde{x}) \le U(x) \le U_U(\tilde{x})$

• $U_L(\tilde{x}) \to U(x)$ and $U_U(\tilde{x}) \to U(x)$ as $\tilde{x} \to x$.
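To make the two conditions concrete, here is a small sketch under assumptions that are not in the text: perceptions are bits, and the utility of a sequence is the binary-weighted sum of its terms, with term $i$ weighted by $2^{-i}$. The names `u_lower` and `u_upper` are hypothetical. The lower bound is the partial sum over the prefix, and the upper bound adds the largest possible contribution of the unseen tail.

```python
from fractions import Fraction


def partial_sum(prefix):
    """Sum of prefix[i] * 2^-(i+1) over the observed bits (0-based i)."""
    return sum(Fraction(bit, 2 ** (i + 1)) for i, bit in enumerate(prefix))


def u_lower(prefix):
    # Every unseen term is >= 0, so the partial sum is a lower bound.
    return partial_sum(prefix)


def u_upper(prefix):
    # The unseen terms sum to at most 2^-len(prefix).
    return partial_sum(prefix) + Fraction(1, 2 ** len(prefix))


prefix = (1, 0, 1)
bounds = (u_lower(prefix), u_upper(prefix))
```

The gap between the two bounds halves with every extra bit, so both converge to the true utility as the prefix grows. This particular utility is bounded, so it only illustrates computability, not unboundedness.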

In any case, we will not assume that U is computable, because we do not need 
such a strong assumption to prove our claims. Instead we will define two possible 
conditions. 

Definition 1. Let $D \subset X^{\mathbb{N}}$ and let $U : D \to \mathbb{R}$. Let $D^P = \{s \in X^* \mid (\exists d \in D) : s \sqsubset d\}$. Then $U$ is computably unbounded from above on $D$ if there exists a computable partial function $U_L : D^P \to \mathbb{Z}$ such that:

• $(\forall d \in D)(\forall s \in D^P)$: if $s \sqsubset d$, and if $U_L(s)$ exists, then $U_L(s) \le U(d)$.

• $(\forall m \in \mathbb{Z})(\exists s \in D^P) : U_L(s) > m$.

$U$ is computably unbounded from below if $-U$ is computably unbounded from above.

Note in particular that any computable function on $X^{\mathbb{N}}$ which is unbounded from above is computably unbounded from above, and any computable function which is unbounded from below is computably unbounded from below.

The following lemma will help us find environments which generate large amounts of utility. When considering $f$ in the lemma, think of $U_L$ above.

Lemma 1. Suppose $C \subset X^*$, and $f : C \to \mathbb{Z}$ is a computable partial function such that $(\forall m \in \mathbb{Z})(\exists c \in C) : f(c) > m$. Then there exists a computable total function $H : \mathbb{Z} \to C$ such that $(\forall m \in \mathbb{Z}) : f \circ H(m) > m$.

In other words, given an unbounded partial function $f$, there is a computable function $H$ which finds an input on which $f$ will exceed any given bound.

Proof. First we'll index $C$; let $C = \{c_1, c_2, \ldots\}$.

If $f$ were a total function, we could simply let $H(m) = c_{\min\{i \in \mathbb{N} : f(c_i) > m\}}$. We would compute this by first computing $f(c_1)$, then $f(c_2)$, etc. Unfortunately we only have that $f$ is a partial function, so we cannot proceed in this way.

Instead, we'll note that for any input on which $f$ halts, it must halt in a specific number of steps. The Cantor pairing function $\pi : \mathbb{N} \times \mathbb{N} \to \mathbb{N}$, $\pi(k_1, k_2) = \frac{1}{2}(k_1 + k_2)(k_1 + k_2 + 1) + k_2$ is a bijection, so we can use $\pi^{-1}$ to index all pairs of natural numbers. Then we can simulate $f$ on every possible input for every number of steps, which will allow us to evaluate $f$ on every input for which $f$ halts.

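    def H(m):
        for n in (1, 2, 3, ...):
            let (t, i) = pi^-1(n)
            simulate f(c_i) for t steps
            # if the simulation does not finish, do nothing and move on
            if the simulation finishes and f(c_i) > m:
                return c_i

The loop always terminates: by hypothesis some $c_i$ satisfies $f(c_i) > m$, $f$ halts on it after some finite number of steps $t$, and the pair $(t, i)$ is eventually reached. Then $H$ is a computable total function and $f \circ H(m) > m$. □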
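The dovetailing search in this proof can be sketched as runnable Python under one modelling assumption: a partial function is represented as a generator that yields once per computation step and returns its value when it halts, so that "simulate for t steps" means advancing the generator at most t times. The names `cantor_unpair`, `run_for`, `make_H`, and `slow_partial` are hypothetical, and indices start at 0 here.

```python
import itertools
import math


def cantor_unpair(n):
    """Inverse of pi(k1, k2) = (k1 + k2)(k1 + k2 + 1)/2 + k2 on naturals >= 0."""
    w = (math.isqrt(8 * n + 1) - 1) // 2   # index of the diagonal containing n
    k2 = n - w * (w + 1) // 2
    return w - k2, k2


def run_for(program, arg, steps):
    """Advance the generator program(arg) at most `steps` times.
    Returns (True, value) if it halted, otherwise (False, None)."""
    gen = program(arg)
    for _ in range(steps):
        try:
            next(gen)
        except StopIteration as done:
            return True, done.value
    return False, None


def make_H(program, enumerate_C):
    """Build H from a step-wise partial function and an enumeration i -> c_i."""
    def H(m):
        for n in itertools.count():
            t, i = cantor_unpair(n)
            finished, value = run_for(program, enumerate_C(i), t)
            if finished and value > m:
                return enumerate_C(i)
    return H


def slow_partial(c):
    # Hypothetical partial function on strings: diverges on odd lengths,
    # otherwise takes len(c) steps and returns len(c).
    if len(c) % 2 == 1:
        while True:
            yield
    for _ in range(len(c)):
        yield
    return len(c)


H = make_H(slow_partial, lambda i: "a" * i)
```

With this toy function, `H(3)` skips the inputs on which the function diverges and returns the first enumerated string whose value exceeds 3. The search never blocks on a divergent input because each simulation is cut off after a bounded number of steps.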
5. Results

Let $R$ be the set of all computable partial functions mapping $\mathbb{N}$ to $\mathbb{N}$, and let $\theta : \mathbb{N} \to R$ be a computable index (analogous to our other index function $\phi$). Let

(4) $B(n) = \max_{k \le n} \theta_k(0)$

Lemma 2. Let $f \in R$ be a total function. Then $B(n) > f(n)$ infinitely often.

Proof. Suppose not. Then $B(n) > f(n)$ only finitely many times, so there exists some $c \in \mathbb{N}$ such that $(\forall n \in \mathbb{N}) : f(n) + c > B(n)$.

Let $C(n, m) = f(n) + c$. By a corollary of the Recursion Theorem, there exists $m \in \mathbb{N}$ such that $(\forall n \in \mathbb{N}) : \theta_m(n) = C(m, n) = f(m) + c$.

By definition, $B(m) \ge \theta_m(0) = C(m, 0) = f(m) + c > B(m)$. So $B(m) > B(m)$, which is a contradiction. □

Now suppose that at time $n + 1$, the agent has already taken actions $y_1^n$ and made observations $x_1^n$, and is considering the expected utility of policy $p$. Let $D = \{s \in X^{\mathbb{N}} : s_1^n = x_1^n\}$.

Theorem 1. If $U$ is computably unbounded from above on $D$, then $E(U(x_1^n \Psi(Q, p, y_1^n, x_1^n)) \mid \Gamma_Q(y_1^n) = x_1^n)$ is either undefined or $+\infty$.

Proof. Let $U_L : D^P \to \mathbb{Z}$ be as in Definition 1. Then by Lemma 1 there exists $H : \mathbb{Z} \to D^P$ such that $(\forall m \in \mathbb{Z}) : U_L(H(m)) > m$.

$H$ here is intended to be used to construct sequences with high utility. Since $H$ outputs a string rather than a sequence, we will pad it to get a sequence. Let $c \in X$ be some arbitrary word in the perception alphabet. Then let $\bar{H} : \mathbb{Z} \to D$, where $\bar{H}(n)$ is a sequence beginning with $H(n)$, followed by $c, c, c, \ldots$.

For brevity, let $W_p(q) = x_1^n \Psi(q, p, y_1^n, x_1^n)$. $W_p(q)$ represents the complete sequence of perceptions received by the agent, assuming that it continues to implement policy $p$ in environment $q$.

We will now break up the expected utility into two terms, depending on whether or not $Q \in \Omega_C$.

$E(U(W_p(Q)) \mid \Gamma_Q(y_1^n) = x_1^n)$
$= P(Q \in \Omega_C) \, E(U(W_p(Q)) \mid \Gamma_Q(y_1^n) = x_1^n, Q \in \Omega_C)$
$+ P(Q \notin \Omega_C) \, E(U(W_p(Q)) \mid \Gamma_Q(y_1^n) = x_1^n, Q \notin \Omega_C)$
$= \sum_{q \in \Omega_C} U(W_p(q)) \, P(\{q\} \mid \Gamma_Q(y_1^n) = x_1^n)$
$+ P(Q \notin \Omega_C) \, E(U(W_p(Q)) \mid \Gamma_Q(y_1^n) = x_1^n, Q \notin \Omega_C)$

We will show that the series

$\sum_{q \in \Omega_C} U(W_p(q)) \, P(\{q\} \mid \Gamma_Q(y_1^n) = x_1^n)$

has infinitely many terms $> 1$. We will do this by finding a sequence of environments whose utilities grow very quickly, more quickly than their probabilities can shrink.
By equation (4), for each $j \in \mathbb{N}$ there exists $u_j \in \mathbb{N}$ such that $u_j \le j$ and $\theta_{u_j}(0) = B(j)$.

Now we define a map on function indices $G : \mathbb{N} \to \mathbb{N}$ such that:

$\phi_{G(n)}(\gamma) = \bar{H}(\theta_n(0))_{|\gamma|}$

So $G$ takes the $\theta$-index of an $\mathbb{N} \to \mathbb{N}$ function (say, $g$), and returns the $\phi$-index of an environment which is compatible with all the data so far, and which is guaranteed to produce utility greater than $g(0)$. We can assume that $G$ is a computable function.

So our sequence of environments will be $\{\phi_{G(u_j)}\}$.

Then $U(W_p(\phi_{G(u_j)})) > B(j)$. Now let

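(5) $\hat{\rho}(j) = \left\lceil \max_{k \le j} \frac{1}{\rho(G(k))} \right\rceil$

Then $\hat{\rho}$ is a computable, nondecreasing function. Since $\hat{\rho}$ is computable, $B(j) > \hat{\rho}(j)$ infinitely often. Since $u_j \le j$, then by definition, $\hat{\rho}(j) \ge \frac{1}{\rho(G(u_j))} \ge \frac{1}{P(\{\phi_{G(u_j)}\})}$.

$P(\Gamma_Q(y_1^n) = x_1^n \mid Q = \phi_{G(u_j)}) = 1$, so by Bayes' Rule, $P(\{\phi_{G(u_j)}\} \mid \Gamma_Q(y_1^n) = x_1^n) \ge P(\{\phi_{G(u_j)}\})$. Since both sides are positive, we take the reciprocal to get $\frac{1}{P(\{\phi_{G(u_j)}\} \mid \Gamma_Q(y_1^n) = x_1^n)} \le \frac{1}{P(\{\phi_{G(u_j)}\})}$. By transitivity, $U(W_p(\phi_{G(u_j)})) > \frac{1}{P(\{\phi_{G(u_j)}\} \mid \Gamma_Q(y_1^n) = x_1^n)}$ infinitely often, so $U(W_p(\phi_{G(u_j)})) \, P(\{\phi_{G(u_j)}\} \mid \Gamma_Q(y_1^n) = x_1^n) > 1$ infinitely often. Since the series contains infinitely many terms $> 1$, its limit is either $+\infty$ or nonexistent. □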
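The shape of this argument can be seen in a deliberately simplified numerical toy. The assumption here is loud: the proof uses utilities that grow like $B(j)$, which outruns every computable function, whereas the sketch below picks the growth rate by hand so that utility exactly cancels a geometrically shrinking probability. The names `partial_sums`, `weight`, and `fast` are hypothetical.

```python
from fractions import Fraction


def partial_sums(utility, weight, terms):
    """Partial sums of sum_i utility(i) * weight(i) for i = 1..terms."""
    total = Fraction(0)
    out = []
    for i in range(1, terms + 1):
        total += utility(i) * weight(i)
        out.append(total)
    return out


def weight(i):
    # Toy probabilities shrinking like 2^-i.
    return Fraction(1, 2 ** i)


def fast(i):
    # Toy utilities growing like 2^i, so every term of the series equals 1.
    return 2 ** i


sums = partial_sums(fast, weight, 5)
```

Each term contributes exactly 1, so the partial sums grow without bound even though the probabilities sum to less than 1. A series with infinitely many terms of size at least 1 cannot converge to a finite value.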

Corollary 1. If $U$ is computably unbounded from below on $D$, then $E(U(x_1^n \Psi(Q, p, y_1^n, x_1^n)) \mid \Gamma_Q(y_1^n) = x_1^n)$ is either undefined or $-\infty$.

Proof. By definition, $-U$ is computably unbounded from above. Thus, by Theorem 1, $E(-U(x_1^n \Psi(Q, p, y_1^n, x_1^n)) \mid \Gamma_Q(y_1^n) = x_1^n)$ is either undefined or $+\infty$. So $E(U(x_1^n \Psi(Q, p, y_1^n, x_1^n)) \mid \Gamma_Q(y_1^n) = x_1^n) = -E(-U(x_1^n \Psi(Q, p, y_1^n, x_1^n)) \mid \Gamma_Q(y_1^n) = x_1^n)$ is either undefined or $-\infty$. □

Corollary 2. If $U$ is computably unbounded from both below and above on $D$, then $E(U(x_1^n \Psi(Q, p, y_1^n, x_1^n)) \mid \Gamma_Q(y_1^n) = x_1^n)$ is undefined.

Proof. By Theorem 1, $E(U(x_1^n \Psi(Q, p, y_1^n, x_1^n)) \mid \Gamma_Q(y_1^n) = x_1^n)$ is either undefined or $+\infty$. By Corollary 1, it is either undefined or $-\infty$. Thus it is undefined. □

6. Discussion 

Our main result implies that if you have an unbounded, perception determined, 
computable utility function, and you use a Solomonoff-like prior (Solomonoff, 1964), 
then you have no way to choose between policies using expected utility. So which 
of these things should we change? 

We could use a non-perception determined utility function. Then our main result 
would not apply. In this case, the existence of bounded expected utility will depend 
on the utility function. It may be possible to generalize our argument to some larger 
class of utility functions which have a different domain. 

We could use an uncomputable utility function. For instance, if the utility of 
any perception-sequence is defined as equal to its Kolmogorov complexity, then the 
utility function is unbounded but the expected utility of any policy is finite. 
We could use a smaller hypothesis space; perhaps not all computable environments should be considered.

The simplest approach may be to use a bounded utility function. Then convergence is guaranteed.
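A one-line justification can be sketched as follows, under assumptions not spelled out in the text: $U$ is measurable, $|U| \le M$ for some constant $M$, and the conditioning event has positive probability. Here $W$ and $A$ are shorthand introduced only for this sketch, standing for the complete perception sequence and the observed history $\Gamma_Q(y_1^n) = x_1^n$.

```latex
\left| E\bigl( U(W) \mid A \bigr) \right|
  \;\le\; E\bigl( |U(W)| \mid A \bigr)
  \;\le\; M
```

So the conditional expectation is a finite number for every policy, and policies can be compared.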